TR-2009001: BISC: A Binary Itemset Support Counting Approach towards Efficient Frequent Itemset Mining
Abstract
The performance of a depth-first Frequent Itemset Mining (FIM) algorithm is closely related to the total number of recursions, which can be modeled as O(n^k), where k is the maximal recursion depth and n is the branching factor. Many existing approaches focus on improving support counting rather than on decreasing n and k, which may lead to unsatisfactory performance as these two factors grow. In this paper, a novel approach, Binary Itemset Support Counting (BISC), is presented to address both factors. Let the direct support of an itemset I be the number of transactions whose itemset is exactly I. BISC derives the supports of all the itemsets in a database by iteratively updating their direct supports, thus eliminating the need for further recursion. BISC converts a database into its binary representation and combines one-stage BISC and two-stage BISC to minimize the cost of support updating and memory consumption by eliminating redundant updating operations. By applying BISC together with the basic projection technique, our approach significantly decreases the maximum depth and branching factor of database projection, thus improving both the time and space efficiency of FIM. In terms of time efficiency, experiments show that BISC outperforms all the other algorithms on the datasets tested, in many cases by almost an order of magnitude or more. Although this does not guarantee that BISC will always perform best, the result is notable given that most existing algorithms are efficient only on some types of datasets. The memory usage of BISC is comparable to, and in most cases smaller than, that of the other algorithms. In summary, the concepts of direct support, binary representation, multi-stage BISC, and the optimization strategies applied in BISC represent a promising approach for related areas.
Footnote 1: The software for this algorithm is available at http://alpha.cs.qc.edu/research.html
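The idea of deriving every itemset's support from direct supports can be illustrated with a short Python sketch. It assumes a small item universe, so that all 2^m itemsets fit in one array, and it encodes each itemset as an integer bitmask. The function name `supports_from_direct` and the toy data are hypothetical. The update shown is the standard superset-sum pass over one item at a time, which is not necessarily the exact one-stage or two-stage procedure used in BISC.

```python
from collections import Counter


def supports_from_direct(transactions, num_items):
    """Return a list where index `mask` holds the support of itemset `mask`.

    `transactions` is an iterable of integer bitmasks (bit i set means
    item i is present). This is an illustrative sketch, not the BISC code.
    """
    size = 1 << num_items
    support = [0] * size

    # Direct support: number of transactions whose itemset is exactly `mask`.
    for mask, count in Counter(transactions).items():
        support[mask] += count

    # Iteratively fold in supersets, one item at a time. After processing
    # item `bit`, support[mask] also counts transactions that additionally
    # contain that item.
    for bit in range(num_items):
        step = 1 << bit
        for mask in range(size):
            if not mask & step:
                support[mask] += support[mask | step]
    return support


# Toy database over items a (bit 0), b (bit 1), c (bit 2).
toy_db = [0b011, 0b111, 0b001, 0b110, 0b011]
toy_support = supports_from_direct(toy_db, 3)
print(toy_support[0b011])  # support of {a, b}
```

The sketch performs m * 2^m additions with no recursion, which is why it is practical only when the number of items is small. Any preprocessing that reduces the item count, such as database projection, is outside the scope of this example.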
Similar resources
Fast Algorithms for Mining Interesting Frequent Itemsets without Minimum Support
Real-world datasets are sparse, dirty, and contain hundreds of items. In such situations, discovering interesting rules (results) with the traditional frequent itemset mining approach, which requires a user-defined support threshold as input, is not appropriate. Without domain knowledge, setting the support threshold too small or too large can output nothing or a large number of redundant uninteresting res...
Concurrent Processing of Frequent Itemset Queries Using FP-Growth Algorithm
Discovery of frequent itemsets is a very important data mining problem with numerous applications. Frequent itemset mining is often regarded as advanced querying where a user specifies the source dataset and pattern constraints using a given constraint model. A significant amount of research on frequent itemset mining has been done so far, focusing mainly on developing faster complete mining al...
Mining Frequent Sequences Using Itemset-Based Extension
In this paper, we systematically explore an itemset-based extension approach for generating candidate sequences, which yields a better and more straightforward search space traversal than the traditional item-based extension approach. Based on this candidate generation approach, we present FINDER, a novel algorithm for discovering the set of all frequent sequences. FINDER is compo...
Ramp: High Performance Frequent Itemset Mining with Efficient Bit-Vector Projection Technique
Mining frequent itemsets using a bit-vector representation is very efficient for small dense datasets, but highly inefficient for sparse datasets because no efficient bit-vector projection technique exists. In this paper we present a novel, efficient bit-vector projection technique for both sparse and dense datasets. We also present a new frequent itemset mining algorithm Ramp (Real Algorithm...
Publication date: 2016